Abusing the internet with popular search engine technologies
by c0ntex | c0ntexb[at]gmail.com
------------------------------------------------------------------------------------------

When you think of a search engine, you probably conjure up an image of the handy, HTML based site that can help you find those RFCs, wiring diagrams, soft-warez, serial keys or questionable images, depending on your taste. You've probably used generic search engines like Google, Yahoo, MSN, Lycos & Hotbot a trillion times without even considering the possibility that they could be used to support an attack attempt.

Due to the sheer volume of HTTP based attacks, one has to consider that anything utilising HTTP is a serious theoretical attack vector. Too many people feel happy in the knowledge that only HTTP holes are punched in the firewall: if the server daemon is patched and user input locations are verified as safe, then there is no need to worry. This is a serious misconception that one should avoid.

HTTP fingerprinting, SQL injection, malformed header injections, XSS, cookie fun: every conceivable form of web based application or service attack can be managed by your security infrastructure. Yet what is the use in having any of this preventative security to protect your data from prying eyes when you have allowed search bots to come in happily and crawl all of your web servers?

The Berkeley teaching guide states that spiders are:

"Computer robot programs, referred to sometimes as "crawlers" or "knowledge-bots" or "knowbots" that are used by search engines to roam the World Wide Web via the Internet, visit sites and databases, and keep the search engine database of web pages up to date. They obtain new pages, update known pages, and delete obsolete ones. Their findings are then integrated into the "home" database."
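
To make the "obtain new pages" part of that definition concrete, here is a minimal sketch of the link-discovery step a spider performs on each page it fetches. It is an illustration only, not how any real search engine is implemented: the names LinkCollector, extract_links and SAMPLE_PAGE are made up for this example, the page is a hypothetical directory listing held in memory, and no network access takes place.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link on the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical directory listing, as a web server might generate it.
SAMPLE_PAGE = """
<html><body><h1>Index of /bbs/</h1>
<a href="index.cgi">index.cgi</a>
<a href="password.txt">password.txt</a>
<a href="../">Parent Directory</a>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_links("http://www.example.com/bbs/", SAMPLE_PAGE):
        print(url)
```

The point of the sketch is that the spider makes no judgement about what it finds: any file that is linked, or listed in an auto-generated index, is queued for fetching like every other URL.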

When any powerful technology is utilised by a curious mind possessing advanced knowledge of its internal workings, it becomes easy to transform that technology into a potentially malicious weapon. In this case, the engine soon becomes an advanced data-gathering tool for attackers. Everyone knows about RFP's web mining tool called Whisker, yet not everyone is aware that a search engine can do the same...and more.

Google is the most powerful search engine online and it probably has been for the last four years or more, crawling literally millions of websites and public archives daily. The amount of data that it contains would crush any reasonable human mind due to the sheer scale. Given the vast scale on which data is stored and the manner in which the engines gather it, it is no wonder that sensitive information is sometimes included.

The spiders that Google uses to harvest all these sites have no mind; they are merely an axon connection to a central soma, acting like a neural network. The data is filtered down the axon if it passes a simple logic test, and the soma then sends it into the central brain. The only energy required to sustain crawling is a diet similar to that of the pecking pigeons which perform the ranking inside the database clusters.

http://www.google.com/technology/pigeonrank.html

When testing a company site or domain, it is always useful to use Google, as its crawling software is superb and will find some pretty interesting stuff you might otherwise overlook. It also has a very nice cache function that will allow you to see "old" pages that were once available from that site. Attackers can use this feature to compare pages, documents, structures and the like.

Picture some fictional public BBS sitting in your DMZ, using flat text files and CGI. The BBS keeps a plain text or MD5-hashed password file in the web directory, and Mr Security forgets about it.
One week later, a search engine spider creeps along, crawls your site and just happens to find the password file. One month later we come along and search your domain for password.txt, and guess what pops up.

Interestingly, it seems a user from the internal network has subscribed to the BBS, using the same password as his domain password. Now things become more interesting, especially when you find out, after a very authentic sounding phone call to his office receptionist, that the user is a member of IT staff.

IT:    domains, routers, switches, servers, databases.
Human: laziness, imperfection, same password everywhere, domain admin account?

It's possible to block spiders by using the robots.txt file, provided the spider can read. An example robots.txt file looks like:

    User-agent: *                 # This shoos all spiders off with a big
    Disallow: /                   # stick, no flies around this server:

    User-agent: googlebot         # Fend off googlebot:
    Disallow: /cgi-bin/
    Disallow: /images/

    # Tease vulnerable spiders:
    User-agent: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA x 350
    Disallow: /GoogleBot PAYLOAD :-)/

If you really suffer from arachnophobia, you could block the crawler bot parasite at your router / firewall.

NOTE: This text is not trying to encourage the use of Google as an attack tool; acting on information found could be illegal. Yet it is worth considering the ramifications of what data you allow your users to place in their private web directories.

Example usage of the Google advanced search features:

    allinurl:       # Find all strings in the URL
    allintitle:     # Find all strings in the title
    filetype:       # Find only a specific filetype
    intitle:        # Find any strings in the title
    inurl:          # Find any strings in the URL
    link:           # Find pages that link to a URL
    site:           # Restrict results to this site

Specific:

    allinurl:session_login.cgi              # Nagios/NetSaint/Nagmin
    allintitle:/cgi-bin/status.cgi?         # Network monitoring
    allintitle:/cgi-bin/extinfo.cgi         # Nagios/NetSaint
    inurl:cpqlogin.htm                      # Compaq Insight Agent
    allinurl:"Index of /WEBAGENT/"          # Compaq Insight Agent
    allinurl:/proxy/ssllogin                # Compaq Insight Agent
    allintitle:WBEM Login                   # Compaq Insight Agent
    WBEM site:domain.com                    # Compaq Insight Agent at domain.com
    inurl:vslogin OR vsloginpage            # Visitor System or Vitalnet
    allintitle:/exchange/root.asp           # Exchange Webmail
    "Index of /exchange/"                   # Exchange Webmail Directory
    netopia intitle:192.168                 # Netopia Router Config

General:

    site:blah.com filetype:cgi              # Check cgi source code
    allinurl:.gov passwd                    # A blackhat favorite
    allinurl:.mil passwd                    # Another blackhat favorite
    "Index of /admin/"
    "Index of /cgi-bin/"
    "Index of /mail/"
    "Index of /passwd/"
    "Index of /private/"
    "Index of /proxy/"
    "Index of /pls/"
    "Index of /scripts/"
    "Index of /tsc/"
    "Index of /www/"
    "Index of" config.php OR config.cgi
    intitle:"index.of /ftp/etc" passwd OR pass -freebsd -netbsd -openbsd
    intitle:"index.of" passwd -freebsd -netbsd -openbsd
    inurl:"Index of /backup"
    inurl:"Index of /tmp"
    inurl:auth filetype:mdb                 # MDB files
    inurl:users filetype:mdb
    inurl:config filetype:mdb
    inurl:clients filetype:xls              # General spreadsheets
    inurl:network filetype:vsd              # Network diagrams

EOF

# milw0rm.com [2006-04-08]
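
Turned around, the queries above suggest a check an administrator can run against their own web root before a spider gets there. The following is a rough sketch under stated assumptions, not a complete audit tool: the function name find_risky_files and the RISKY_PATTERNS list are invented for this example, the patterns are simply drawn from the file names and extensions that appear in the queries in this text, and the script only reads file names on the local disk.

```python
import fnmatch
import os
import sys

# Assumed pattern list, taken from the file names and extensions used in
# the example queries; extend it to match your own environment.
RISKY_PATTERNS = (
    "passwd", "password*", "config.php", "config.cgi",
    "*.mdb", "*.xls", "*.vsd",
)


def find_risky_files(web_root, patterns=RISKY_PATTERNS):
    """Return sorted paths (relative to web_root) whose name matches a pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            lowered = name.lower()  # match case-insensitively
            if any(fnmatch.fnmatchcase(lowered, p) for p in patterns):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, web_root))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python audit.py /path/to/webroot   (defaults to the current directory)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_risky_files(root):
        print(path)
```

Matching on names alone is deliberately crude. It will miss sensitive data stored under innocent names and will flag harmless spreadsheets, so treat the output as a list of files to review rather than a verdict.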